Garuda & Bhasha at SemEval-2016 Task 11: Complex Word Identification Using Aggregated Learning Models

Authors

  • Prafulla Choubey
  • Shubham Pateria
Abstract

This paper describes aggregated learning models for the Complex Word Identification (CWI) task at SemEval 2016. The work focused on selecting features that determine the complexity of words and used different combinations of support vector machine (SVM) and decision tree (DT) techniques for classification. These classifiers were pipelined with pre-processing and post-processing blocks, which helped improve the accuracy of the systems but had little impact on recall. Four systems were evaluated on the test set: the SVM and DT systems by team Bhasha achieved G-scores of 0.529 and 0.508 respectively, and the SVM&DT and SVMPP systems by team Garuda achieved G-scores of 0.360 and 0.546 respectively.
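
The abstract reports results as G-scores alongside accuracy and recall. As we understand the Task 11 evaluation, the G-score is the harmonic mean of accuracy and recall; the following is a minimal sketch under that assumption, not the official scorer. The function names and the toy labels are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def accuracy_and_recall(gold: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Compute accuracy and recall for binary labels (1 = complex, 0 = simple)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    correct = sum(1 for g, p in pairs if g == p)
    true_positives = sum(1 for g, p in pairs if g == 1 and p == 1)
    gold_positives = sum(1 for g in gold if g == 1)
    accuracy = correct / len(pairs) if pairs else 0.0
    recall = true_positives / gold_positives if gold_positives else 0.0
    return accuracy, recall


def g_score(accuracy: float, recall: float) -> float:
    """Harmonic mean of accuracy and recall (assumed definition of the G-score)."""
    if accuracy + recall == 0:
        return 0.0
    return 2 * accuracy * recall / (accuracy + recall)


# Toy labels, invented for illustration only (not data from the paper).
gold = [1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0]
acc, rec = accuracy_and_recall(gold, predicted)
print(f"accuracy={acc:.3f} recall={rec:.3f} G={g_score(acc, rec):.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a system with high accuracy but low recall still receives a low G-score, which is why a step that raises accuracy with little effect on recall may move the G-score only modestly.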

Similar resources

Sensible at SemEval-2016 Task 11: Neural Nonsense Mangled in Ensemble Mess

This paper describes our submission to the Complex Word Identification (CWI) task in SemEval-2016. We test an experimental approach to blindly use neural nets to solve the CWI task that we know little/nothing about. By structuring the input as a series of sequences and the output as a binary that indicates 1 to denote complex words and 0 otherwise, we introduce a novel approach to complex word ...

UWB at SemEval-2016 Task 11: Exploring Features for Complex Word Identification

In this paper, we present our system developed for the SemEval 2016 Task 11: Complex Word Identification. Our team achieved the 3rd place among 21 participants. Our systems ranked 4th and 13th among 42 submitted systems. We proposed multiple features suitable for complex word identification, evaluated them, and discussed their properties. According to the results of our experiments, our final s...

IIIT at SemEval-2016 Task 11: Complex Word Identification using Nearest Centroid Classification

This paper describes the system that was submitted to SemEval2016 Task 11: Complex Word Identification. It presents a preliminary investigation into exploring word difficulty for non-native English speakers. We developed two systems using Nearest Centroid Classification technique to distinguish complex words from simple words. Optimized over G-score, the presented solution obtained a G-score of...

CLaC at SemEval-2016 Task 11: Exploring linguistic and psycho-linguistic Features for Complex Word Identification

This paper describes the system deployed by the CLaC-EDLK team to the SemEval 2016, Complex Word Identification task. The goal of the task is to identify if a given word in a given context is simple or complex. Our system relies on linguistic features and cognitive complexity. We used several supervised models, however the Random Forest model outperformed the others. Overall our best configurat...

USAAR at SemEval-2016 Task 11: Complex Word Identification with Sense Entropy and Sentence Perplexity

This paper describes an information-theoretic approach to complex word identification using a classifier based on an entropy based measure based on word senses and sentence-level perplexity features. We describe the motivation behind these features based on information density and demonstrate that they perform modestly well in the complex word identification task in SemEval-2016. We also discus...

Publication date: 2016